A Lyapunov-Based Methodology for Constrained Optimization with Bandit Feedback

Abstract

In a wide variety of applications, including online advertising, contractual hiring, and wireless scheduling, the controller is constrained by a stringent budget constraint on the available resources, which are consumed in a random amount by each action, and by a stochastic feasibility constraint that may impose important operational limitations on decision-making. In this work, we consider a general model to address such problems, where each action returns a random reward, cost, and penalty from an unknown joint distribution, and the decision-maker aims to maximize the total reward under a budget constraint B on the total cost and a stochastic constraint on the time-average penalty. We propose a novel low-complexity algorithm based on Lyapunov optimization methodology, named LyOn, and prove that for K arms it achieves regret of order √(KB log B) and zero constraint violation when B is sufficiently large. The low computational cost and sharp performance bounds of LyOn suggest that Lyapunov-based algorithm design methodology can be effective in solving constrained bandit optimization problems.
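To make the problem setting in the abstract concrete, the following is a minimal Python sketch of a budget-constrained bandit loop with a virtual queue, in the general spirit of Lyapunov drift-plus-penalty methods. It is not the LyOn algorithm from the paper: the function name `run_sketch`, the weight `v`, the Bernoulli and uniform sampling models, the confidence-bonus form, and the index rule are all assumptions made purely for illustration.

```python
import math
import random


def run_sketch(arms, budget, penalty_limit, v=10.0, seed=0):
    """Illustrative loop only (hypothetical, not the paper's LyOn).

    arms: list of (mean_reward, mean_cost, mean_penalty), each in (0, 1].
    budget: total cost budget B.
    penalty_limit: target upper bound on the time-average penalty.
    v: assumed weight trading reward against queue pressure.
    """
    rng = random.Random(seed)
    num_arms = len(arms)
    pulls = [0] * num_arms
    sums = [[0.0, 0.0, 0.0] for _ in range(num_arms)]  # reward, cost, penalty
    queue = 0.0  # virtual queue: accumulated penalty above the limit
    spent = total_reward = total_penalty = 0.0
    rounds = 0

    def index(i, t):
        # Optimistic estimates: upper bound on reward, lower bounds on
        # cost and penalty (cost floored at 0.05 to avoid division by ~0).
        n = pulls[i]
        bonus = math.sqrt(2.0 * math.log(t) / n)
        reward_ucb = min(1.0, sums[i][0] / n + bonus)
        cost_lcb = max(0.05, sums[i][1] / n - bonus)
        penalty_lcb = max(0.0, sums[i][2] / n - bonus)
        # Weighted reward minus queue-weighted penalty excess, per unit cost.
        return (v * reward_ucb - queue * (penalty_lcb - penalty_limit)) / cost_lcb

    while True:
        t = rounds + 1
        if t <= num_arms:
            arm = t - 1  # pull every arm once before using the index
        else:
            arm = max(range(num_arms), key=lambda i: index(i, t))
        mean_r, mean_c, mean_p = arms[arm]
        # Assumed sampling model: Bernoulli reward/penalty, uniform cost.
        reward = 1.0 if rng.random() < mean_r else 0.0
        cost = rng.uniform(0.5 * mean_c, 1.5 * mean_c)
        penalty = 1.0 if rng.random() < mean_p else 0.0
        if spent + cost > budget:
            break  # the budget cannot cover this action: stop
        spent += cost
        total_reward += reward
        total_penalty += penalty
        rounds += 1
        pulls[arm] += 1
        for j, x in enumerate((reward, cost, penalty)):
            sums[arm][j] += x
        # The queue grows when the penalty exceeds the limit, else drains.
        queue = max(0.0, queue + penalty - penalty_limit)

    return {
        "rounds": rounds,
        "spent": spent,
        "total_reward": total_reward,
        "avg_penalty": total_penalty / rounds if rounds else 0.0,
    }


result = run_sketch(
    arms=[(0.9, 0.8, 0.6), (0.5, 0.4, 0.1), (0.3, 0.5, 0.0)],
    budget=200.0,
    penalty_limit=0.3,
)
```

The virtual queue is the design choice worth noting: it converts the time-average penalty constraint into a running quantity, so arms with high estimated penalty become less attractive once violations accumulate. The sketch makes no claim about matching the regret or constraint-violation guarantees stated in the abstract.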

Similar resources

Stochastic convex optimization with bandit feedback

This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points...

Towards Minimax Policies for Online Linear Optimization with Bandit Feedback

We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order √(dn log N) for any finite action set with N actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous √d factor compared to previous works, and gives a regret bound...

A Robust Knapsack Based Constrained Portfolio Optimization

Many portfolio optimization problems deal with allocation of assets which carry a relatively high market price. Therefore, it is necessary to determine the integer value of assets when we deal with portfolio optimization. In addition, one of the main concerns with most portfolio optimization is associated with the type of constraints considered in different models. In many cases, the resulted p...

Online Stochastic Optimization under Correlated Bandit Feedback

In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel any-time X-armed bandit algorithm, and derive regret bounds matching the performance of the existing state of the art in terms of dependency on the number of steps and the smoothness factor. The main advantage of HCT is t...

Stochastic Linear Optimization under Bandit Feedback

In the classical stochastic k-armed bandit problem, in each of a sequence of rounds, a decision maker chooses one of k arms and incurs a cost chosen from an unknown distribution associated with that arm. In the linear optimization analog of this problem, rather than finitely many arms, the decision set is a compact subset of R and the cost of each decision is just the evaluation of a randomly c...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i4.20285